
The term error appears in several related (but not identical) contexts throughout science in general and statistical science in particular.

Error still carries the flavour of mistake (something erroneous), at least in the context of measurement error and particularly when scientists are thinking about their data. But its primary meaning in statistical science has long since been simply that of more or less uncontrolled variation (something erratic or errant). Sampling error, for example, refers to sampling variation, the uncontrolled and uncontrollable fact that different samples, responsibly taken, will include different data; hence in general any statistics (such as means, correlations, fraction blue) based on those samples will differ from sample to sample.
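As a quick illustration of sampling variation (a sketch of my own in Python/NumPy; the population parameters and sample size are arbitrary assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# A hypothetical population we can sample from repeatedly.
population = rng.normal(loc=100.0, scale=15.0, size=100_000)

# Five responsibly taken samples still yield five different sample means.
sample_means = [rng.choice(population, size=50, replace=False).mean()
                for _ in range(5)]
print([round(float(m), 2) for m in sample_means])
```

Each draw gives a different mean; that spread, not any mistake, is the sampling error.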

In simple regression-type models, error refers to individual disturbances in specifications such as

$$\text{response variable} = \text{function of predictors} + \text{stochastic error}$$

and error can refer more generally to the conditional distribution of the response variable given the predictors.
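A minimal sketch of that specification and its recovery by least squares (Python/NumPy; the linear form and the parameter values 2.0 and 0.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
x = np.linspace(0.0, 10.0, 200)
error = rng.normal(loc=0.0, scale=1.0, size=x.size)  # stochastic, zero-mean disturbances
y = 2.0 + 0.5 * x + error                            # response = function of predictor + error

# A least-squares fit recovers the systematic part; the residuals estimate the error.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted: y = {intercept:.2f} + {slope:.2f} x")
```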

Bias refers to the difference between the true or correct value of some quantity and a measurement or estimate of that quantity. In principle, therefore, it cannot be calculated unless that true or correct value is known, although this problem bites to varying degrees.

In the simplest kind of problem, the true value is known (as when the centre of a target is visible and the distance of a shot from the centre can be measured; this is a common analogy) and bias is then usually calculated as the difference between the true value and the mean (or occasionally some other summary) of measurements or estimates.
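In code, the target analogy might look like this (my sketch; the offset of 0.8 standing in for the bias is an invented value):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
true_centre = 0.0                       # the visible centre of the target
systematic_offset = 0.8                 # unknown to the shooter; what we hope to estimate
shots = true_centre + systematic_offset + rng.normal(0.0, 1.0, size=1000)

# Bias: the mean of the measurements minus the known true value.
bias_estimate = shots.mean() - true_centre
print(f"estimated bias: {bias_estimate:.2f}")   # close to 0.8
```

The same simulation previews the split discussed below: the constant offset is the systematic error (non-zero mean) and the scatter around it is the random error (zero mean).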

In other problems, some careful method is regarded as the state of the art, yielding the best possible measurements, so other methods are regarded as more or less biased according to their degree of systematic departure from that best method (in some fields termed a gold standard).
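A comparable sketch for the gold-standard case, where the mean difference between methods (in the spirit of a Bland–Altman comparison) is read as the bias of the lesser method; all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
gold = rng.normal(50.0, 5.0, size=200)                 # best available measurements
cheap = gold + 1.5 + rng.normal(0.0, 2.0, size=200)    # method with a systematic departure

differences = cheap - gold
print(f"mean departure from the gold standard: {differences.mean():.2f}")  # about 1.5
```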

In yet other problems, we have one or more methods all deficient to some degree and assessment of bias is then difficult or impossible. It is then tempting, or possibly even natural, to change the question and judge truth according to consistency between methods.

The two terminologies can be made consistent with the idea that systematic measurement errors have non-zero means (hence their summary quantifies bias) and random errors have zero mean. (Equivalently, that is how we label error as systematic or random.)

In mathematical statistics, standard analyses examine whether particular estimators are biased in small samples, asymptotically, and so forth, either in general or under particular circumstances.
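The textbook example is the maximum-likelihood variance estimator, biased in small samples but asymptotically unbiased; this sketch (Python/NumPy, with arbitrary parameter choices) makes the small-sample bias visible:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
true_var, n = 4.0, 5                       # small n, so the bias is easy to see
samples = rng.normal(0.0, 2.0, size=(50_000, n))

mle = samples.var(axis=1, ddof=0)          # divide by n: biased downward by (n - 1)/n
unbiased = samples.var(axis=1, ddof=1)     # divide by n - 1: unbiased

print(f"MLE mean {mle.mean():.2f} vs expected {(n - 1) / n * true_var:.2f}; "
      f"unbiased mean {unbiased.mean():.2f} vs true {true_var:.2f}")
```

As $n$ grows, the factor $(n-1)/n$ tends to 1 and the bias vanishes, which is what asymptotic unbiasedness means here.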

This sketch at times implies that error is defined additively, so that

$$\text{measured value} = \text{true value} + \text{error}$$

but that is just the simplest situation. Nothing here rules out the idea that error may be multiplicative rather than additive, or defined on more complicated scales (e.g. in measuring proportions or percentages, error may be better considered on something like a logit scale).
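Both alternatives are easy to demonstrate (a sketch with invented parameters): logging a multiplicative error restores the additive form, and adding error on the logit scale keeps a measured proportion inside (0, 1).

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# Multiplicative error: taking logs makes it additive again.
true_value = 10.0
measured = true_value * rng.lognormal(mean=0.0, sigma=0.1, size=1000)
print(f"mean log measurement: {np.log(measured).mean():.3f} "
      f"(log true value: {np.log(true_value):.3f})")

# Proportions: additive error on the logit scale, then back-transform.
true_p = 0.9
noisy_logit = np.log(true_p / (1 - true_p)) + rng.normal(0.0, 0.3, size=1000)
measured_p = 1 / (1 + np.exp(-noisy_logit))
print(f"measured proportions stay in (0, 1): min {measured_p.min():.3f}, "
      f"max {measured_p.max():.3f}")
```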

Comments on erroneous and erratic here were inspired by discussions in Jeffreys, Harold. 1939/1948/1961. Theory of Probability. London: Oxford University Press.


